Eliminating internal initiation of soluble CD4 gene

ABSTRACT

The present invention is based upon the discovery that genes that include the CD4 sequence in its cDNA form can give rise to additional polypeptides as a result of internal translation initiation. This invention is thus directed to DNA sequences which eliminate internal initiation expression in sCD4.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. Ser. No. 08/013,828, filed Feb. 2, 1993, now abandoned, which is a file wrapper continuation of U.S. Ser. No. 08/013,828, filed Feb. 2, 1993, now pending; which is a continuation of international application PCT/US91/04565, filed Jul. 2, 1991, now abandoned; which is a continuation of U.S. Ser. No. 07/562,861, filed Aug. 6, 1990, now abandoned.

FIELD OF THE INVENTION

This invention relates to expression of recombinant DNA sequences. More particularly, it relates to expression of DNA sequences encoding soluble CD4 (sCD4) polypeptides in microbial hosts.

BACKGROUND OF THE INVENTION

CD4, a normal membrane component of the T4 lymphocyte, binds gp120, an envelope glycoprotein of the human immunodeficiency virus (HIV). This RNA virus, which is responsible for acquired immune deficiency syndrome (AIDS) in humans, uses CD4 as its receptor for infection (Klatzmann, D., et al. (1984) "Selective tropism of lymphadenopathy associated virus (LAV) for helper-inducer T lymphocytes." Science 225:59-63). CD4 has 4 extracellular domains (Maddon, P., et al. (1985) "The isolation and nucleotide sequence of a cDNA encoding the T cell surface protein T4: A new member of the immunoglobulin gene family." Cell 42:93-104). A soluble molecule including some or all of these domains is referred to as sCD4. The two N-terminal domains of CD4 appear to be the most important for gp120 binding, and proteins which incorporate this gp120 binding capability have been proposed as potential therapeutics for AIDS because they may target the protein to the virus, to HIV-infected cells, or to other species that might have exposed gp120 (Hussey, R., et al. (1988) "A soluble CD4 protein selectively inhibits HIV replication and syncytium formation." Nature 331:768-81; Deen, K., et al. (1988) "A soluble form of CD4 (T4) protein inhibits AIDS virus infection." Nature 331:82-84; Traunecker, A., et al. (1988) "Soluble CD4 molecules neutralize human immunodeficiency virus type 1." Nature 331:84-86; Berger, E., et al. (1988) "A soluble recombinant polypeptide comprising the amino-terminal half of the extracellular region of the CD4 molecule contains an active binding site for human immunodeficiency virus." Proc. Natl. Acad. Sci. U.S.A. 85:2357-2361). The determinants for high affinity binding of gp120 are in domain 1, residues 1-109 of CD4 (Arthos, J., et al. (1989) "Identification of the residues in human CD4 critical for the binding of HIV." Cell 57:469-481).

sCD4-PE40 is such a potential therapeutic agent for the treatment of AIDS (Chaudhary, V., et al. (1988) "Selective Killing of HIV-Infected Cells by Recombinant Human CD4-Pseudomonas Exotoxin Hybrid Protein." Nature 335:369-372). This hybrid protein consists of an N-terminal methionine (amino acid 1) followed by the first two domains of CD4 (178 amino acids), several linker amino acids, and the last two domains of Pseudomonas exotoxin A (amino acids 253-613 of the toxin). The resulting protein contains 545 amino acids and has a calculated molecular weight of approximately 59,200 daltons. Amino acids 2-110 in sCD4-PE40 (Chaudhary, supra) correspond to residues 3-111 in the cDNA sequence of Maddon, supra, except that residue 3 of the Maddon sequence should be lysine, and to residues 1-109 (domain 1) of Arthos, supra. The gene for sCD4-PE40 has the sequence reported by Chaudhary, supra, except that the codons that correspond to the N-terminal portion of the protein have been modified as described for sCD4-183 in PCT Application No. PCT/US90/01367, and codon 179, corresponding to Ala, is GCT rather than GCG.

Upon expression of sCD4-PE40 in E. coli, we have found a major contaminant which is immunologically related to sCD4-PE40 and has a molecular mass of approximately 50,000 daltons. This protein has the N-terminal sequence Met-Leu-Val-Phe-Gly-Leu-Thr-Ala-, which corresponds to the C-terminal 449 residues of sCD4-PE40, i.e., beginning with Leu⁹⁷ (preceded by a methionine). The 50,000 dalton protein results from internal initiation within domain 1 of sCD4; a UUG codon downstream of potential Shine-Dalgarno sequences is read as an initiation codon by f-Met-tRNA. Since the contaminant is closely related to the full length sCD4-PE40 product, it has similar biochemical properties. Accordingly, it co-purifies with the desired product and may interfere with the oxidation and folding of sCD4-PE40 to its biologically active conformation.

Among the potential causes investigated for the impurity is internal initiation. A gene including the above-described region of domain 1 with the potential for internal initiation may generate an impurity with an N-terminal Met-Leu-Val-Phe-Gly-Leu-Thr-Ala- sequence. Such internal initiation could result from translating a sCD4-containing gene in many prokaryotic organisms, including but not limited to E. coli. Since proteins including sCD4 components are potential human drugs, it is desirable to eliminate the cause of the contaminating protein.

Four sequence-related features appear to favor translation initiation in prokaryotes. First, the preferred initiation codon is AUG. GUG and UUG can function as initiation codons, although at only about 10 and 1 percent of the frequency of AUG, respectively. (Hershey, J. (1987) Protein Synthesis. In "Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology". F. C. Neidhardt, et al., eds. (American Society for Microbiology: Washington, DC) p.613-641.) These codons are recognized by f-Met-tRNA as the site where amino acid polymerization is to begin. (Gold, L. (1988) "Posttranscriptional Regulatory Mechanisms in E. coli." Ann. Rev. Biochem. 57:199-233.)

The second feature that favors prokaryotic initiation is the Shine-Dalgarno sequence, e.g. 5'-UAAGGAGGUGA-3', a sequence in the mRNA which is complementary to the 3' terminal sequence of 16s rRNA, such that base pairs can be formed to stabilize the initiation complex. (Shine, J., and Dalgarno, L. (1974) "The 3' terminal sequence of E. coli 16s ribosomal RNA: Complementarity to nonsense triplets and ribosome binding sites." Proc. Natl. Acad. Sci. USA 71:1342-1346; Steitz, J., and Jakes, K. (1975) "How ribosomes select initiator regions in mRNA: Base pair formation between the 3' terminus of 16s rRNA and the mRNA during initiation of protein synthesis in E. coli." Proc. Natl. Acad. Sci. USA 72:4734-4738.) A variety of sequences which retain complementarity to the 16s RNA can function in this role. Shine-Dalgarno-like sequences usually include GGAG or GAGG and typically are located about 5-13 bases upstream of the initiation codon for most effective initiation. (Gold, L., supra.)
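
As a rough illustration of the spacing rule described above, the following Python sketch scans an mRNA string for AUG, GUG or UUG codons that have a GGAG or GAGG core about 5-13 bases upstream. The function name `candidate_starts`, the gap definition (bases between the end of the core and the first base of the codon) and the demo sequence are illustrative assumptions, not taken from the cited work. A hit only flags a candidate; the other features discussed in this section also affect whether initiation actually occurs.

```python
import re

START_CODONS = {"AUG", "GUG", "UUG"}   # initiator codons named in the text
SD_CORES = ("GGAG", "GAGG")            # Shine-Dalgarno-like cores named in the text


def candidate_starts(mrna, min_gap=5, max_gap=13):
    """Return (codon_index, codon, core, gap) for start codons with an SD-like core upstream.

    gap counts the bases between the end of the core and the first base of
    the codon. This is an assumed reading of "5-13 bases upstream".
    """
    hits = []
    for i in range(len(mrna) - 2):
        codon = mrna[i:i + 3]
        if codon not in START_CODONS:
            continue
        for core in SD_CORES:
            # A lookahead pattern reports overlapping cores (e.g. within GGAGG).
            for m in re.finditer(f"(?={core})", mrna):
                gap = i - (m.start() + len(core))
                if min_gap <= gap <= max_gap:
                    hits.append((i, codon, core, gap))
    return hits


# Hypothetical demo mRNA: the example Shine-Dalgarno sequence from the text,
# an arbitrary spacer, then an AUG codon.
demo = "UAAGGAGGUGA" + "AUAUAC" + "AUGAAA"
hits = candidate_starts(demo)

# Replacing the GGAGG core removes every candidate in this demo string.
disrupted_hits = candidate_starts(demo.replace("GGAGG", "GCAGC"))
```

In the demo, the GUG embedded in the Shine-Dalgarno sequence itself is not reported because no core lies far enough upstream of it.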

The third feature is a region which facilitates ribosome binding and initiation. A preferred pattern of nucleotides spanning at least -20 to +13 bases about the initiation codon of many E. coli genes has been detected by in vitro analysis of ribosome protected sequences (Steitz, J., supra) and by statistical analysis (Stormo, G., et al. (1982) "Characterization of translational initiation sites in E. coli." Nucleic Acids Res. 10:2971-2996; Schneider, T., et al. (1986) "Information Content of Binding Sites on Nucleotide Sequences." J. Mol. Biol. 188:415-431). Additionally, translation reinitiation can occur if a translational start signal overlaps (Oppenheim, D., and Yanofsky, C. (1980) "Translational Coupling During the Expression of the Tryptophan Operon of Escherichia Coli." Genetics 95:785-795) or follows one of the translational stop signals. (Steitz, J. (1979) "Genetic signals and nucleotide sequences in messenger RNA." In "Biological Regulation and Development. 1. Gene Expression.") Such reinitiation does not require a Shine-Dalgarno sequence and differs from the intragenic initiation discussed herein.

The fourth feature is the absence of significant mRNA secondary structure in the initiation codon region that might block the necessary annealing events with the 16s RNA or the initiator tRNA (Gold, L. (1988), supra).

The presence of potential translation initiation points can be identified in several ways. First, the sequencing of the N-terminus of immunoreactive peptides should yield methionine for peptides resulting from initiation, although in some cases methionine aminopeptidase can remove methionine, leaving the adjacent residue in the sequence at the N-terminus (Waller, J. (1963) "The NH₂-terminal residue of the proteins from cell-free extracts of E. coli." J. Mol. Biol. 7:483-496; Ben-Bassat, A., et al. (1987) "Processing of the initiation methionine from proteins: Properties of the E. coli methionine aminopeptidase and its gene structure." J. Bacteriol. 169:751-757). In that case, one must rely on the gene sequence to determine if the terminal amino acid was encoded with an adjacent codon capable of initiating translation. Codons which direct the insertion of the N-terminal Met can be AUG, GUG or UUG (Gold, L., supra). Secondly, one can analyze the gene for sequences approximating a good initiation region. Many of these sequences are not functional. (Stormo, G., and Schneider, T., supra). Thirdly, translation initiation points can be found through "footprinting" or "toeprinting" experiments in which regions of the mRNA to which ribosomes bind either are protected from nuclease digestion or block the elongation of a primed, reverse-transcribed DNA copy. (Gold, L., supra.)

Intragenic ribosome initiation sites have been identified in a number of genes. Following expression in E. coli of poliovirus 3C protease, initiation at the AUG of codon 27 gave rise to significant levels of an unstable internal initiation product (Hanecak, R., et al. (1984) "Expression of a cloned gene segment of poliovirus in E. coli: Evidence for autocatalytic production of the viral proteinase." Cell 37:1063-1073; Ivanoff, L., et al. (1986) "Expression and site-specific mutagenesis of the poliovirus 3C protease in E. coli." Proc. Natl. Acad. Sci. USA 83:5392-5396). Furthermore, expression of xylanase in E. coli was accompanied by the production of a species apparently initiating at GUG, codon 471. (Grepinet, O., et al. (1988) "Nucleotide sequence and deletion analysis of the xylanase gene (xynZ) of Clostridium thermocellum." J. Bacteriol. 170:4582-4588.) Translation initiation within the porcine parvovirus structural protein B occurs at internal initiation sites, with at least two of these internal initiation peptides produced at higher levels than the full length recombinant protein. (Halling, S., and Smith, S. (1985) "Expression in E. coli of multiple products from a chimaeric gene fusion: Evidence for the presence of procaryotic translational control regions within eucaryotic genes." Bio/Technology 3:715-720.) Finally, expression of a simian rotavirus glycoprotein in E. coli generated an apparent product of internal initiation at a level similar to that of the full length molecule. (Arias, C., et al. (1986) "Synthesis of the outer-capsid glycoprotein of the simian rotavirus SA11 in E. coli." Gene 47:211-219.) It has been proposed that commercial production can be facilitated by removing internal initiation sites through mutagenesis (Halling, S., supra).

Once the cause of the impurity has been determined to be internal initiation, a method of eliminating the internal initiation needs to be developed.

INFORMATION DISCLOSURE

As discussed above, several intragenic ribosome initiation sites have been identified. Initiation at the AUG of codon 27 gave rise to significant levels of an unstable internal initiation product following poliovirus 3C protease expression in E. coli. See, e.g., Hanecak, R., et al. (1984) "Expression of a cloned gene segment of poliovirus in E. coli: Evidence for autocatalytic production of the viral proteinase." Cell 37:1063-1073 and Ivanoff, L., et al. (1986) "Expression and site-specific mutagenesis of the poliovirus 3C protease in E. coli." Proc. Natl. Acad. Sci. USA 83:5392-5396. E. coli expression of xylanase was accompanied by the production of a species initiating at GUG. See, e.g., Grepinet, O., et al. (1988) "Nucleotide sequence and deletion analysis of the xylanase gene (xynZ) of Clostridium thermocellum." J. Bacteriol. 170:4582-4588. Translation initiation within the porcine parvovirus structural protein B occurs at internal initiation sites. See, e.g., Halling, S. and Smith, S. (1985) "Expression in E. coli of multiple products from a chimaeric gene fusion: Evidence for the presence of procaryotic translational control regions within eucaryotic genes." Bio/Technology 3:715-720. Halling and Smith suggest that mutagenesis could remove internal initiation sites. Simian rotavirus glycoprotein expression in E. coli also generated a product of internal initiation. See, e.g., Arias, C., et al. (1986) "Synthesis of the outer-capsid glycoprotein of the simian rotavirus SA11 in E. coli." Gene 47:211-219. None of these references mention CD4 proteins.

SUMMARY OF THE INVENTION

The present invention provides DNA sequences that eliminate internal translation initiation without changing the amino acid sequence encoded by genes containing portions of sCD4. More specifically, the modified sequence comprises: ##STR1## This sequence may be modified by various codon substitutions, deletions, additions or replacements. All such allelic variations and modifications resulting in a sCD4 protein in which internal translation initiation has been eliminated are included within the scope of this invention.

The present invention further provides recombinant DNA molecules which do not support internal initiation of sCD4. The present invention also provides host cells transformed with these recombinant DNA molecules.

The present invention also provides methods of eliminating internal initiation of sCD4 which comprise substituting base sequences for the Shine-Dalgarno-like sequences that precede the codon of amino acid 96 of sCD4 and/or modifying codon 96 or other codons which can be recognized for translation initiation.

The term "Shine-Dalgarno-like sequences" as used herein means sequences with complementarity to the 3' end of 16s rRNA and which could be used as a ribosome binding site. These can include but are not limited to GGAG, GAGG, AGGAGGT, GGAGG, and AAGGAGG.

The term "sCD4" refers to any protein or hybrid molecule that includes sequences related to those in T cell CD4 and is capable of binding to gp120, the external subunit of the HIV envelope glycoprotein. Such molecules are exemplified in Chaudhary, V., supra; Klatzmann, D., supra; Smith, D., supra; Fisher, R., supra; Hussey, R., supra; Deen, K., supra; Traunecker, A., supra; Berger, E., supra; Capon, D., et al. (1988) "Designing CD4 immunoadhesions for AIDS therapy." Nature, 337:525-531; and Till, M., et al. (1988) "HIV-infected cells are killed by rCD4-ricin A chain." Science 242:1166-1168; European Patent Application No. 0,331,356. The term "hybrid molecule" refers to a molecule that contains functional components derived from two independent molecule species. The independent molecule species can be used in whole or in part to produce the "hybrid molecule".

The term "host cell" as used herein means any procaryotic cell capable of being transformed with the modified DNA sequence encoding the first domain of an sCD4 molecule wherein internal initiation expression has been eliminated without altering the amino acid sequence of the sCD4, including but not limited to E. coli.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is based upon the discovery that genes that include the CD4 sequence in its cDNA form can give rise to additional polypeptides because of an intragenic nucleotide sequence which favors translation initiation. The invention is thus directed to a novel method for preventing such initiation, particularly comprising a modified sequence which minimizes the potential for internal initiation.

sCD4-PE40 is a four domain hybrid protein. It consists of N-terminal methionine, the first two domains of CD4 (178 amino acids), several linker amino acids, and the last two domains of Pseudomonas exotoxin A (amino acids 253-613 of the toxin). The resulting protein contains 545 amino acids and has a calculated molecular weight of approximately 59,200 daltons. See Chaudhary, V., supra.

A variety of lower molecular weight species cross-reacting with antibodies to sCD4 have been found by Western Blot analysis in a variety of E. coli strains producing sCD4-PE40. The major contaminant has a molecular weight of approximately 50,000 daltons and represented 5-20% of the level of sCD4-PE40 in isolated inclusion bodies. Although such a species could result from errors in biosynthesis, e.g., frameshifting or termination, it seemed more likely to represent a proteolytic fragment. To test this hypothesis, it was necessary to identify the putative clip site in order to develop approaches for eliminating the protease(s) responsible for the proteolysis.

The impurity protein was characterized by N-terminal sequence analysis following isolation by electroblotting from SDS-PAGE or Reversed-Phase HPLC. The amino acid sequence of the impurity lacked the first 96 residues of sCD4-PE40; it began with N-terminal methionine and continued with the sequence starting at residue 97. The apparent molecular weight of 50,000 daltons observed on SDS-PAGE was in good agreement with the calculated molecular weight of 48,375 daltons for such a fragment of sCD4-PE40 comprising residues 96-545. In view of our original hypothesis, the presence of N-terminal methionine on the protein was surprising in that there are no known mechanisms for generating the identified sequence from the intact protein by proteolysis.

Protein synthesis in E. coli is initiated with N-formyl methionine. The N-terminal methionine is usually deformylated as the nascent peptide chain is elongated. Furthermore, depending on the adjacent amino acids, a methionyl aminopeptidase often removes the N-terminal methionine. The cleavage is inhibited by the adjacent lysine in sCD4-183 and sCD4-PE40. The presence of methionine at the N-terminus of the impurity and the observed composition indicated that the impurity was not a proteolytic fragment but resulted from internal initiation at amino acid 96.

For initiation of protein synthesis, an initiation codon (usually AUG) is required. The presence of a Shine-Dalgarno-like sequence enhances the efficiency of initiation. To generate the observed impurity, an initiation codon must be present at a position corresponding to amino acid 96. The codon corresponding to Leu⁹⁶ is UUG. One of six codons specifying leucine, UUG is rarely found in the mRNA of highly expressed E. coli genes and the corresponding tRNA is found in low abundance. UUG is read, although infrequently (at <1% of normal initiation), by the f-Met-tRNA as an initiation codon. For this unusual initiation to occur, an upstream ribosome binding site is required. Inspection of the sequence encoding the impurity revealed three good Shine-Dalgarno-like sequences only five, eight and twenty nucleotides upstream of the UUG. Thus, internal translation initiation was a reasonable explanation for the presence of a subsequence of sCD4-PE40 beginning with Met-Leu⁹⁷.

A modified sCD4-PE40 gene can be constructed in which the leucine codon has been changed from UUG to CUG and the GGAGG sequences have been changed to remove these Shine-Dalgarno-like sequences. These changes eliminate expression of the internal initiation product but do not alter the amino acid sequence of the full length sCD4-PE40 protein. Other similar alterations in sequences in this area will be readily apparent to those skilled in the art.

The present invention is exemplified in more detail in the examples below.

EXAMPLE 1

In this example, we set forth the construction of cells and their induction to express sCD4-PE40 in E. coli.

The UC12656 strain of E. coli is used as the host for sCD4-PE40 expression. This strain is derived from NRRL B-18303. The derivation of the NRRL B-18303 strain is described in International Application No. PCT/US88/0038 which is incorporated herein by reference. The UC12656 strain is made in three steps which employ techniques well known to those skilled in the art. First, NRRL B-18303 is crossed with an Hfr strain to replace the rpoH112 allele with rpoH⁺. In addition, this cross removes a Tn10 adjacent to the rpoH locus and introduces the rpsL100 allele. Second, the NRRL B-18303 culture is resistant to lambda owing to an alteration in its lamB gene; a lamB⁺ allele is transduced into the strain. Finally, a cryptic lambda lysogen from the strain TAP106 (obtained from Dr. Donald Court, NCI-Frederick Cancer Institute, Frederick, Md. 21701; Chen, S., et al. (1990) "Expression and characterization of RNaseIII and Era proteins: Products of the rnc operon of Escherichia coli." J. Biol. Chem. 265:2888-2895) is P1 transduced into the strain to create UC12656. The lambda cryptic from TAP106 contains the following genetic configuration: (int-ral)Δ, N::Kan, cI857, (cro-bioA)Δ.

A vector used to express the sCD4-PE40 protein is pUC1456. The vector is derived from pBR322 (available from Pharmacia LKB Biotechnologies, Piscataway, N.J. 08854) by cloning into the EcoRI and HindIII restriction sites a fragment containing the lambda P_(L) promoter, the TAT32 ribosome binding site, and the sCD4-PE40 gene. The P_(L) promoter is taken from the pJL-6 vector (Lautenberger, J., et al. (1983) "High-level expression in Escherichia coli of the carboxyl-terminal sequence of the avian myelocytomatosis virus (MC29) v-myc protein." Gene:75-84; the vector can be obtained from Dr. Donald Court). The promoter, ribosome binding site and the sCD4-PE40 gene are constructed and cloned using techniques that are well known to those skilled in the art. The P_(L) promoter is modified by introducing an XbaI restriction site shortly after the +1 nucleotide of the promoter. The modified promoter is designated P_(L6m). The TAT32 ribosome binding site is derived from synthetic oligonucleotides that contained a sequence derived from the ribosome binding site of the bacteriophage T4 gene 32 (Gorski, K., et al. (1985) "The stability of bacteriophage T4 gene 32 mRNA: A 5' leader sequence that can stabilize mRNA transcripts." Cell 43:461-469). The sCD4-PE40 gene is obtained from Chaudhary, V., supra, and modified by making changes in codon usage for several N-terminal codons. The codons that correspond to the N-terminal portion of the protein are modified as described for sCD4 in PCT Application No. PCT/US90/01367.

The pUC1456 vector is transformed into competent cells of UC12656. The culture is developed from one of the transformed colonies and is designated UC12575.

UC12575 cells are grown at 30° C. and induced by heat shifting to 40° C. This results in the formation of intracellular aggregates (inclusion bodies) containing sCD4-PE40.

The vector pUC1456 and transformed culture UC12575 of Example 1 were deposited at The Agricultural Research Culture Collection (NRRL), Northern Regional Research Center, 1815 North University Street, Peoria, Ill. 61604, under the Accession No. NRRL B-18667 on Jun. 27, 1990, in accordance with the requirements of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure.

EXAMPLE 2

This example describes the isolation and characterization of the impurity that contaminates preparations of sCD4-PE40.

Samples containing sCD4-PE40 in inclusion body form are analyzed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE), electroblotting onto PVDF membranes, Western blotting, and sequencing according to methods readily apparent to those skilled in the art. In particular, the solids from cells or inclusion body preparations are collected by centrifugation. SDS-PAGE is performed essentially as described by Laemmli (Laemmli, U. (1970) "Cleavage of structural proteins during the assembly of the head of bacteriophage T4." Nature 227:680-685), except that samples are heated at 100° C. for 5 minutes in ethanolamine sample buffer (10 g SDS; 45 ml water; 20 ml 1M ethanolamine, pH 10; 25 ml glycerol; 10 ml 0.05% (w/v) Bromophenol Blue) before application to gels. Following electrophoresis, these gels are rinsed immediately, arranged in a blotting sandwich containing polyvinylidene difluoride (PVDF) membranes (Immobilon, pore size 0.45 μm), and blotted electrophoretically. This protein transfer to the PVDF membrane is by the discontinuous semi-dry method of Hirano, H. (1989) "Microsequence analysis of winged bean seed proteins electroblotted from two-dimensional gel." J. Protein Chem. (1):115-130. The blots are visualized either with anti-sera or with Coomassie Blue R250.

SDS-PAGE analysis of cells expressing sCD4-PE40 reveals a major band at about 60,000 daltons corresponding to the recombinant product as well as a variety of other bands. When the inclusion bodies are separated from soluble proteins, the sCD4-PE40 is enriched, increasing from 10-20% of the total protein to 50-80%. The major impurity band, which has an apparent weight of about 50,000 daltons, is also greatly enriched. By densitometric scanning, this band is 5-20% of the sCD4-PE40 band.

Western analysis of the PVDF blots is conducted to determine if the observed bands are related to sCD4-PE40. Immunodetection is accomplished with rabbit anti-sera to sCD4-183, to sCD4-PE40, and to Lys-PE40 (Domains 2 and 3 of Pseudomonas exotoxin A). Many immunoreactive bands are observed, including the major impurity. The band with an apparent weight of 50,000 daltons is immunoreactive with each antibody tested, indicating that it contains domains of both sCD4 and PE40.

For N-terminal sequence analysis, the PVDF membranes containing the blotted protein are stained with Coomassie Blue R250 for 2 minutes, destained with an aqueous solution of 50% methanol and 10% acetic acid for 3 minutes, rinsed with Milli-Q water and air dried. The 50 Kd protein band is excised from the dried blot, cut into approximately 2×4 mm pieces, and loaded into the upper block of a sequencer cartridge above a Polybrene-loaded, precycled filter. N-terminal sequence analysis is performed on an Applied Biosystems (ABI) 470A sequencer equipped with an on-line ABI 120A PTH analyzer.

Following SDS-PAGE of sCD4-PE40 from inclusion bodies, the 50,000 dalton impurity located with Coomassie Blue R-250 on a PVDF membrane was sequenced through 15 residues on the ABI 470A. Two sequences were apparent, with the minor species presumably representing cross-contamination by the neighboring 60,000 dalton band since it had the N-terminus of sCD4-PE40 (e.g., Met¹-Lys²-Lys³-Val⁴-Val⁵-Leu⁶-Gly⁷). The most abundant sequence begins with Met-Leu-Val-Phe-Gly-Leu-Thr-Ala, corresponding to N-terminal methionine followed by Leu⁹⁷ through Leu¹¹⁰ of sCD4-PE40.

EXAMPLE 3

In this example, we set forth the design and construction of a synthetic DNA fragment from oligonucleotides to eliminate a Shine-Dalgarno-like sequence in sCD4-PE40.

The DNA sequence between the BclI and EcoNI restriction sites of the cDNA encoding sCD4-PE40 is shown below. These restriction sites are unique for the CD4-PE40 gene. The BclI-EcoNI fragment encompasses codons 71 (ATC) through 111 (CAG) of the sCD4-PE40 sequence, which corresponds to codons 72-112 in the cDNA sequence determined by Maddon, supra. ##STR2##

In the sequence preceding the codon for amino acid 96 there are three regions that have strong homology to the so-called Shine-Dalgarno sequence (i.e., a sequence complementary to the 3' end of the 16s rRNA) and could be used as ribosome binding sites. These sequences are indicated below: ##STR3##

To disrupt these Shine-Dalgarno-like sequences, the following base substitutions can be made without altering the amino acid sequence of the encoded protein. ##STR4##
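
The idea of silent substitutions that remove Shine-Dalgarno-like cores can be sketched in Python as follows. This is only an illustration under stated assumptions: the input window `AAGGAGGAGGTG` is a hypothetical Lys-Glu-Glu-Val coding sequence chosen because it contains GGAG and GAGG, not the native sequence of the figure above, and the helper names are invented for this sketch. The code enumerates every synonymous recoding of the window and keeps those free of both cores.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_without_motifs(dna, motifs=("GGAG", "GAGG")):
    """Yield every recoding of dna that encodes the same peptide and lacks all motifs."""
    protein = translate(dna)
    choices = [
        [codon for codon, aa in CODON_TABLE.items() if aa == residue]
        for residue in protein
    ]
    for codons in product(*choices):
        candidate = "".join(codons)
        if not any(motif in candidate for motif in motifs):
            yield candidate


# Hypothetical window (Lys-Glu-Glu-Val); NOT the patent's native sequence.
window = "AAGGAGGAGGTG"
options = list(synonymous_without_motifs(window))
```

One of the recodings returned for this window is AAA GAA GAA GTT, which is the codon run that appears for Lys-Glu-Glu-Val in oligonucleotide 2 listed later in this example.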

In addition, the TTG initiation codon can be changed from TTG to CTG. The CTG codon is not a known initiation codon. Additional codon changes have been made in the DNA sequence to optimize the codon usage in this region. The use of codon optimization is known to those skilled in the art. These changes are shown above the native sequence. ##STR5##
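
The choice of CTG can be checked with a few lines of Python. The sketch below lists the single-base changes to TTG that still encode leucine and flags whether each is one of the initiation codons named in the Background (ATG, GTG, TTG in DNA notation). The function name `single_base_synonyms` is an assumption made for this illustration.

```python
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}   # the six leucine codons
START_CODONS = {"ATG", "GTG", "TTG"}                      # initiators named in the text


def single_base_synonyms(codon, family):
    """Return codons in family that differ from codon at exactly one position."""
    variants = []
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                variant = codon[:pos] + base + codon[pos + 1:]
                if variant in family:
                    variants.append(variant)
    return variants


silent_changes = single_base_synonyms("TTG", LEU_CODONS)
safe_changes = [c for c in silent_changes if c not in START_CODONS]
```

The other two first-position changes to TTG give ATG and GTG, which are themselves initiation codons and do not encode leucine, so they are excluded by the leucine filter.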

The codon optimization changes in combination with the codon changes for removal of the three ribosome binding sites and the TTG initiation site are shown as a composite below. The codon changes are indicated above the native sequence. ##STR6##

Four oligonucleotides are synthesized as described in International Application No. PCT/US88/00328 which is incorporated herein by reference. These oligonucleotides, when hybridized and ligated, will form a synthetic fragment that contains the codon changes presented above and can be cloned into the unique BclI/EcoNI sites of the CD4-PE40 gene, thus replacing the native sequence that contains the intragenic ribosome binding site. The sequences of the four oligonucleotides are:

1. 5' GATCATCAAGAACCTGAAGATCGAAGACTCTGATACCTACATCTGTGAAGTTGAAGA

2. 5' CCAGAAAGAAGAAGTTCAACTGCTGGTGTTTGGTCTGACTGCTAACTCTGACACTCACCTGC

3. 5' AGCAGGTGAGTGTCAGAGTTAGCAGTCAGACCAAACACCAGCAGTTGAACTTCTTCT

4. 5' TTCTGGTCTTCAACTTCACAGATGTAGGTATCAGAGTCTTCGATCTTCAGGTTCTTGAT
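
As a consistency check on the oligonucleotides listed above, the following Python sketch joins oligonucleotides 1 and 2 (the coding strand), translates them, and inspects codon 96. It assumes the reading frame starts at the second base of oligonucleotide 1, so that the first full codon is codon 71 (ATC) as stated earlier in this example; the variable names are illustrative.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

OLIGO_1 = "GATCATCAAGAACCTGAAGATCGAAGACTCTGATACCTACATCTGTGAAGTTGAAGA"
OLIGO_2 = "CCAGAAAGAAGAAGTTCAACTGCTGGTGTTTGGTCTGACTGCTAACTCTGACACTCACCTGC"

top = OLIGO_1 + OLIGO_2

# Assumed frame: drop the leading G so the first codon is codon 71 (ATC).
coding = top[1:]
codons = [coding[i:i + 3] for i in range(0, len(coding) - 2, 3)]
peptide = "".join(CODON_TABLE[c] for c in codons)

FIRST_CODON = 71
codon_96 = codons[96 - FIRST_CODON]

# Shine-Dalgarno-like cores named in the text should be absent from the new strand.
has_sd_core = any(core in top for core in ("GGAG", "GAGG"))
```

Under this frame the strand encodes residues 71 through 109, codon 96 reads CTG rather than TTG, and the residues from position 97 onward (Leu-Val-Phe-Gly-Leu-Thr-Ala) match the impurity sequence reported in Example 2.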

The crude oligonucleotides are purified by cutting out the product band on a 20% acrylamide gel and desalting over a Waters Sep-Pak column as described in PCT Application No. PCT/US88/00328. The oligonucleotides form strands of the synthetic fragment as indicated below. ##STR7##
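
The way the four oligonucleotides pair can be checked computationally. The sketch below is an illustration, not the procedure used in the example: it assumes oligonucleotides 1 and 2 form the top strand and oligonucleotides 3 and 4 the bottom strand, with a 4-base 5' overhang (GATC) at the BclI end and a 1-base 5' overhang at the EcoNI end, and verifies that the remaining bases are fully complementary.

```python
OLIGO_1 = "GATCATCAAGAACCTGAAGATCGAAGACTCTGATACCTACATCTGTGAAGTTGAAGA"
OLIGO_2 = "CCAGAAAGAAGAAGTTCAACTGCTGGTGTTTGGTCTGACTGCTAACTCTGACACTCACCTGC"
OLIGO_3 = "AGCAGGTGAGTGTCAGAGTTAGCAGTCAGACCAAACACCAGCAGTTGAACTTCTTCT"
OLIGO_4 = "TTCTGGTCTTCAACTTCACAGATGTAGGTATCAGAGTCTTCGATCTTCAGGTTCTTGAT"


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


top = OLIGO_1 + OLIGO_2       # 5'->3', read from the BclI end to the EcoNI end
bottom = OLIGO_3 + OLIGO_4    # 5'->3', read from the EcoNI end to the BclI end

TOP_OVERHANG = 4      # assumed 5' GATC overhang on the top strand (BclI end)
BOTTOM_OVERHANG = 1   # assumed single-base 5' overhang on the bottom strand (EcoNI end)

duplex_top = top[TOP_OVERHANG:]
duplex_bottom = bottom[BOTTOM_OVERHANG:]
strands_pair = revcomp(duplex_top) == duplex_bottom

# The nicks are staggered: the 5' end of oligo 4 pairs with the 5' end of oligo 2.
stagger_pairs = revcomp(OLIGO_2[:6]) == OLIGO_4[:6]
duplex_length = len(duplex_top)
```

With these assumptions the double-stranded region is 115 base pairs long, which agrees with the 115 bp BclI-EcoNI fragment released from the vector in Example 4. The staggered nicks also explain why oligonucleotides 2 and 4, whose 5' ends lie inside the fragment, are the ones that are kinased.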

The procedures used can be found in Current Protocols in Molecular Biology (edited by Ausubel, F., et al., and published by John Wiley and Sons). Oligonucleotides 2 and 4 are kinased using ³²P gamma-labeled ATP. Oligonucleotides 1 and 4, and 2 and 3, are hybridized to each other. The two sets of hybridized oligonucleotides, 1/4 and 2/3, are ligated and run on a 12% acrylamide gel. The synthetic oligonucleotide-derived DNA fragment is visualized by autoradiography, cut from the gel and isolated. The sequence of the synthetic DNA fragment is as follows: ##STR8##

EXAMPLE 4

In this example we set forth the cloning of the BclI/EcoNI fragment into the CD4-PE40 gene.

A detailed description of the cloning methodologies employed herein can be found in Current Protocols in Molecular Biology (supra). These techniques and the pBR322 vector used in the clonings described are well known to those skilled in the art.

The pUC1456 vector described in Example 1 contains BclI and EcoNI restriction sites in the sCD4-PE40 gene and a second EcoNI restriction site resident in the pBR322 sequence downstream of the HindIII restriction site. The pUC1456 vector is transformed into the E. coli strain CGSC 6580 which had been lysogenized with the bacteriophage lambda. This strain carries the dam13::Tn9 allele (the strain can be obtained from Dr. Barbara Bachmann, Coli Genetic Stock Center, Department of Biology, 255 OML, Yale University, P.O. Box 6666, New Haven, Conn. 06511-7444). The dam13::Tn9 allele prevents methylation of the adenine in the sequence GATC. Methylation of this site prevents the BclI restriction enzyme from cutting the DNA. The use of a dam-deficient host to permit the BclI enzyme to cut is well known to those skilled in the art. Vector DNA is isolated and digested with BclI and EcoNI restriction endonucleases. This digestion produces a large vector fragment, a 1837 bp fragment and a 115 bp fragment. The vector fragment is isolated from an agarose gel, ligated to the synthetic oligonucleotide-derived fragment, and transformed into competent cells of UC12656. The juncture formed between the EcoNI site of the oligonucleotide-derived fragment of Example 3 and the EcoNI site in the vector generates a PstI restriction site which can be used to identify candidates with the oligonucleotide fragment inserted. One of the candidates identified by restriction analysis is selected and the presence of the oligonucleotide fragment insert is confirmed by DNA sequence analysis. The vector is designated pUC1470.

In order to reconstruct the sCD4-PE40 gene, an additional vector, pUC1469, is constructed. The pBR322 vector contains EcoRI, ClaI, HindIII, EcoNI and NdeI restriction sites. Each of these sites is unique in the vector. The CD4-PE40 gene can be cloned as a ClaI/HindIII fragment into the corresponding ClaI/HindIII restriction sites in the pBR322 vector. However, in such a vector the EcoNI site in the CD4-PE40 gene would not be unique. To prevent this, the pBR322 vector is cut with the HindIII and NdeI restriction enzymes. The "sticky" ends are filled in with PolA Klenow fragment in the presence of dNTPs. The DNA is run on an agarose gel and the fragment containing the ampicillin resistance gene and origin of replication is isolated, ligated and transformed into competent cells of MC1061. Candidates are analyzed by restriction digestion, and a clone with the deletion is identified. The ligation of the filled-in HindIII and NdeI restriction sites regenerates a HindIII site. This vector is designated pUC1468. The ClaI/HindIII CD4-PE40 fragment is isolated from pUC1456 and is cloned into the pUC1468 vector at the corresponding ClaI/HindIII sites. The resultant vector is designated pUC1469.
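
The regeneration of the HindIII site on ligation of the filled-in ends can be checked on paper from the recognition sequences alone. The following Python sketch is illustrative only. It assumes the standard recognition sequences and cut positions for HindIII (A^AGCTT) and NdeI (CA^TATG), and it models the Klenow fill-in as simple completion of the 5' overhang. The function and variable names are hypothetical.

```python
def filled_ends(site, cut_offset):
    """Top-strand sequences left at each fragment end after cutting a
    palindromic site that leaves a 5' overhang and filling in both ends.

    Returns (left_end, right_end), read 5' to 3' on the top strand.
    """
    n = len(site)
    if not 0 < cut_offset < n - cut_offset:
        raise ValueError("expects a cut that leaves a 5' overhang")
    # Fill-in extends the recessed 3' end across the overhang, so the left
    # fragment keeps the site up to the bottom-strand cut position.
    left_end = site[: n - cut_offset]
    # The right fragment keeps everything from the top-strand cut onward.
    right_end = site[cut_offset:]
    return left_end, right_end


HINDIII = ("AAGCTT", 1)   # A^AGCTT
NDEI = ("CATATG", 2)      # CA^TATG

hind_left, hind_right = filled_ends(*HINDIII)
nde_left, nde_right = filled_ends(*NDEI)

# Both possible blunt junctions between a HindIII end and an NdeI end.
junction_a = hind_left + nde_right
junction_b = nde_left + hind_right

for name, junction in (("HindIII-left/NdeI-right", junction_a),
                       ("NdeI-left/HindIII-right", junction_b)):
    print(name, junction,
          "HindIII site:", HINDIII[0] in junction,
          "NdeI site:", NDEI[0] in junction)
```

In this model either orientation of the junction contains AAGCTT and neither contains CATATG, which is consistent with the statement that the ligation regenerates a HindIII site.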

To regenerate the sCD4-PE40 gene, the pUC1470 vector is digested with EcoNI and EcoRI restriction enzymes and a fragment of approximately 600 bp containing the P_(L6m) promoter, the TAT32 ribosome-binding site and the 5' portion of the sCD4-PE40 gene with the modifications of the internal ribosome-binding sites is isolated. The pUC1469 vector is digested with EcoRI and EcoNI restriction enzymes to generate a vector fragment and a fragment of approximately 600 bp. The vector fragment is isolated. The two isolated fragments are ligated and transformed into UC12656. The DNA sequence derived from the oligonucleotide fragment contains an XmnI site that can be used for characterizing clones. A candidate with the correct restriction pattern is identified. The region modified by the cloning of the oligonucleotide fragment is sequenced for confirmation. The resultant vector is designated pUC1467. This vector is transformed into UC12656, and the resultant culture, designated UC12657, is capable of high-level expression of the unmodified sCD4-PE40 protein from the modified gene. The vector pUC1467 and transformed culture UC12657 of Example 4 were deposited at The Agricultural Research Culture Collection (NRRL), Northern Regional Research Center, 1815 North University Street, Peoria, Ill. 61604, under the Accession No. NRRL B-18676 on Jul. 13, 1990, in accordance with the requirements of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure.

EXAMPLE 5

In this example, it is shown that the use of a modified gene, such as described in Example 4, eliminates the production of the 50 kilodalton fragment which was described in Example 2.

Strain UC12657 containing pUC1467 is grown and induced as described for strain UC12575 in Example 1. The cells are analyzed for sCD4-PE40 using SDS-PAGE and Western blotting as described in Example 2. An examination of the gel reveals that the 50 kilodalton immunoreactive species apparent in the UC12575 culture is not detected in the induced UC12657 culture, indicating that internal initiation has been eliminated.

We claim:
 1. A DNA molecule comprising a DNA sequence encoding the first domain of a soluble CD4 wherein said DNA sequence has been modified to eliminate the synthesis of a protein that initiates at the codon of amino acid 96 of soluble CD4 wherein the DNA sequence between the BclI and EcoNI restriction sites of soluble CD4 comprises: ##STR9##
 2. The DNA molecule comprising a DNA sequence encoding the first domain of a soluble CD4 wherein said DNA sequence has been modified to eliminate the synthesis of a protein that initiates at the codon of amino acid 96 of soluble CD4 wherein the modified DNA sequence eliminates Shine-Dalgarno-like sequences determined by gene sequence analysis to be GGAG, GAGG, AGGAGGT, GGAGG, or AAGGAGG, or by experiments to be complementary to the 3' end of 16S rRNA and to function as a ribosome binding site and potential initiation codons determined by gene sequence analysis to be the initiation codons AUG, GUG or UUG, or by footprinting or toeprinting experiments to function as an initiation codon.
 3. A recombinant DNA molecule according to claim 1.
 4. A recombinant DNA molecule according to claim 2.
 5. A host cell transformed with the DNA molecule of claim 3.
 6. A host cell transformed with the DNA molecule of claim 4.
 7. A host cell of claim 5 which is Escherichia coli.
 8. A host cell of claim 6 which is Escherichia coli.
 9. A method of eliminating internal initiation from a DNA sequence encoding the first domain of a soluble CD4 gene comprising modifying said DNA sequence to eliminate the synthesis of a protein that initiates at the codon of amino acid 96 of soluble CD4 wherein said modifying step comprises substituting the DNA sequence between BclI and EcoNI restriction sites of soluble CD4 with: ##STR10##
 10. A method of eliminating internal initiation at the codon of amino acid 96 of soluble CD4 gene comprising detecting the presence of any potential initiation codons in the soluble CD4 gene by gene sequence analysis or by footprinting or toeprinting experiments and substituting alternate base sequences in said potential initiation codons.
 11. An expression vector, useful for transforming an Escherichia coli cell and permitting the host cell to produce a soluble CD4 molecule in which internal initiation has been eliminated, which vector contains a P_(L6m) promoter, a TAT32 ribosome binding site and a 5' portion of the soluble CD4 gene, said vector being pUC1467.
 12. An Escherichia coli host cell comprising the vector of claim 11.
 13. The host cell of claim 12 which is NRRL B-18676.
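
The gene sequence analysis referred to in claims 2 and 10 can be sketched as a simple motif scan. The following Python example is illustrative only and is not the procedure used by the inventors. The Shine-Dalgarno-like motifs and initiation codons are those recited in claim 2, written as DNA. The permitted spacing of 4 to 12 nucleotides between motif and codon is an assumption introduced here, as are the function name and the toy sequences, which are not taken from the CD4 gene.

```python
# Motifs and codons recited in claim 2, written as DNA (T in place of U).
SD_MOTIFS = ["AAGGAGG", "AGGAGGT", "GGAGG", "GGAG", "GAGG"]
START_CODONS = {"ATG", "GTG", "TTG"}


def find_internal_start_candidates(dna, min_gap=4, max_gap=12):
    """List (motif, motif_pos, codon, codon_pos) tuples where a
    Shine-Dalgarno-like motif is followed by a potential initiation codon.

    min_gap and max_gap bound the number of nucleotides between the end of
    the motif and the first base of the codon. Both defaults are assumptions.
    """
    dna = dna.upper()
    hits = set()
    for motif in SD_MOTIFS:
        pos = dna.find(motif)
        while pos != -1:
            motif_end = pos + len(motif)
            for gap in range(min_gap, max_gap + 1):
                codon_pos = motif_end + gap
                codon = dna[codon_pos:codon_pos + 3]
                if codon in START_CODONS:
                    hits.add((motif, pos, codon, codon_pos))
            # Continue from the next base so overlapping motifs are found.
            pos = dna.find(motif, pos + 1)
    return sorted(hits, key=lambda h: (h[3], h[1], h[0]))


# Toy sequences for illustration only.
with_site = "CCCAGGAGGTAAACATATGCCC"
without_site = "CCCACGCGCTAAACATATGCCC"

for hit in find_internal_start_candidates(with_site):
    print(hit)
print(find_internal_start_candidates(without_site))
```

A scan of this kind only nominates candidate sites. Whether a candidate actually functions as an internal initiation site would still have to be established experimentally, for example by the footprinting or toeprinting experiments mentioned in the claims.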